strong evidence
Early detection of disease outbreaks and non-outbreaks using incidence data
Gao, Shan, Chakraborty, Amit K., Greiner, Russell, Lewis, Mark A., Wang, Hao
Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that there are statistical features that distinguish outbreak and non-outbreak sequences long before outbreaks occur. We could detect these differences in synthetic and real-world data sets, well before potential outbreaks occur.
- Asia > Singapore (0.34)
- North America > United States (0.28)
- Asia > China > Hong Kong (0.25)
- (22 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.72)
Balancing central and marginal rejection when combining independent significance tests
Salahub, Chris, Oldford, Wayne
A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, in particular when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number which behaves like a univariate $p$-value. To clarify discussion of these functions, a telescoping series of alternative hypotheses are introduced that communicate the strength and prevalence of non-null evidence in the $p$-values before general pooling formulae are discussed. A pattern noticed in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two for pooled $p$-values. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.
r/MachineLearning - [D] OpenAI releases GPT-2 1.5B model despite "extremist groups can use GPT-2 for misuse" but "no strong evidence of misuse so far".
We've seen no strong evidence of misuse so far They are going against their own word, but nevertheless, it's nice to see that they are releasing everything. EDIT: The unicorn example added below from https://talktotransformer.com/, which has already been updated with the newest 1.5B parameters model. Input: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. Output: While there are only a few documented instances of unicorns in the wild, the researchers said the finding proves that there are still large numbers of wild unicorns that remain to be studied.
- South America > Argentina (0.07)
- North America > United States > Nevada > Washoe County > Reno (0.07)
- Media > News (0.53)
- Law Enforcement & Public Safety > Terrorism (0.40)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)
Network Classification and Categorization
Canning, James P., Ingram, Emma E., Nowak-Wolff, Sammantha, Ortiz, Adriana M., Ahmed, Nesreen K., Rossi, Ryan A., Schmitt, Karl R. B., Soundarajan, Sucheta
To the best of our knowledge, this paper presents the first large-scale study that tests whether network categories (e.g., social networks vs. web graphs) are distinguishable from one another (using both categories of real-world networks and synthetic graphs). A classification accuracy of $94.2\%$ was achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, real-world networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of an arbitrary network. Second, classifying synthetic networks is trivial as our models can easily distinguish between synthetic graphs and the real-world networks they are supposed to model.
- North America > United States > Alabama > Tuscaloosa County > Tuscaloosa (0.15)
- North America > United States > New York > Onondaga County > Syracuse (0.05)
- North America > United States > Indiana > Porter County > Valparaiso (0.05)
- (3 more...)
- Information Technology (0.49)
- Health & Medicine > Therapeutic Area (0.30)